Mathematical Modeling to Predict COVID-19 Infection and Vaccination Trends

Background: COVID-19 caused by the Severe Acute Respiratory Syndrome Coronavirus 2 placed the health systems around the entire world in a battle against the clock. While most of the existing studies aimed at forecasting the infections trends, our study focuses on vaccination trend(s). Material and methods: Based on these considerations, we used standard analyses and ARIMA modeling to predict possible scenarios in Romania, the second-lowest country regarding vaccinations from the entire European Union. Results: With approximately 16 million doses of vaccine against COVID-19 administered, 7,791,250 individuals had completed the vaccination scheme. From the total, 5,058,908 choose Pfizer–BioNTech, 399,327 Moderna, 419,037 AstraZeneca, and 1,913,978 Johnson & Johnson. With a cumulative 2147 local and 17,542 general adverse reactions, the most numerous were reported in recipients of Pfizer–BioNTech (1581 vs. 8451), followed by AstraZeneca (138 vs. 6033), Moderna (332 vs. 1936), and Johnson & Johnson (96 vs. 1122). On three distinct occasions have been reported >50,000 individuals who received the first or second dose of a vaccine and >30,000 of a booster dose in a single day. Due to high reactogenicity in case of AZD1222, and time of launching between the Pfizer–BioNTech and Moderna vaccine could be explained differences in terms doses administered. Furthermore, ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(0,2,0), ARIMA(2,1,0), ARIMA(1,2,2), ARI-MA(2,2,2), ARIMA(0,2,2), ARIMA(2,2,2), ARIMA(1,1,2), ARIMA(2,2,2), ARIMA(2,1,1), ARIMA(2,2,1), and ARIMA (2,0,2) for all twelve months and in total fitted the best models. These were regarded according to the lowest MAPE, p-value (p < 0.05, p < 0.01, and p < 0.001) and through the Ljung–Box test (p < 0.05, p < 0.01, and p < 0.001) for autocorrelations. Conclusions: Statistical modeling and mathematical analyses are suitable not only for forecasting the infection trends but the course of a vaccination rate as well.


Introduction
Even though the number of human coronavirus infections per year is low [1], the current health crisis caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has placed each country worldwide in an unprecedented state of emergency [2]. Originally was theorized to be another low-level respiratory outbreak similar to the previous health crises [3][4][5][6] and pathophysiology analogies with SARS-CoV-2 [7].
It led initially to a relatively insignificant number of pneumonia cases [2] with an unknown etiology. Additional experiments brought new data and insight regarding the emergence of a novel beta-strain that belongs to the zoononic coronavirus (CoV) that is particularly virulent to humans [8]. Until the genome of this pathogen was fully sequenced, the number of cases confirmed already reached 15 with 1 fatality [9].
With fifty-nine confirmed cases at the beginning of 2020 [10], it became the main priority reflected by the associated mortality rate and high tropism towards the respiratory

Data Collection and Parameters Analyzed
Data related to (1) the number of doses administered (dose 1, dose 2, and booster), (2) those that had or not completed the vaccination scheme, and (3) adverse reactions of local or general type were taken from the official website of the Romanian Government (https://vaccinare-covid.gov.ro/ (accessed on 28 December 2021)). We centralized all data by creating a time-series database using MS Excel for the following predetermined interval (27 December 2020-27 December 2021).

ARIMA
We divided the interval into short subdivisions with a forecast of 7 days. The error rate is dependent on the period forecasted based on our previous experience [23]. Equations, variants, model selection, and parameters are presented in former studies conducted by our team [23,24].

Statistical Analyses
Data analysis for three out of four parameters was carried out using Microsoft Excel 2010, whereas the software used for ARIMA modeling is STATGRAPHICS Centurion (18.1.14).

Results
We observe that a significant percentage of the Romanian citizens were willing to receive the serum from Pfizer-BioNTech. Following the centralization and analysis of data, we noted a fluctuating trend that lasted several months before reaching the peak on 21st April 2021. With 55,643 (SE-724.01, SD-13,851.18, CI95%-1423.76) individuals immunized in a single day, we expected an increase since the second dose became available starting with 3 of 20 17 January 2021. Precisely, 11 May 2021 was the day in which was registered the second most numerous groups with 54,757 (SE-732.40, SD-13,603.80, CI95%-1440.55). Unfortunately, this trend gradually decreased again from May until mid-October. After this decline, the phase normalized with 2 successful waves of recipients and a maximum of 35,425 (SE-691.87, SD-6600.03, CI95%-1374.52) recipients of booster on 29 September 2021. From that point onwards, all three trends significantly decrease, this possibly having to do with the relaxation of restrictions in our country. Even though the technology behind manufacturing is the same as BNT162b1/BNT162b2, the Moderna vaccine did not benefit from the same interest among citizens. With figures suggesting that approximately 15.27% of recipients have chosen mRNA-1273 by comparison with Pfizer-BioNTech, on 5 February 2021, 8501 (SE-107.58, SD-1930.60, CI95%-211.66) people received the first dose. The situation remained the same also towards the introduction of the second dose on 12 Figure 1). ta, we noted a fluctuating trend that lasted several months before reaching the peak on 21st April 2021. With 55,643 (SE-724.01, SD-13,851.18, CI95%-1423.76) individuals immunized in a single day, we expected an increase since the second dose became available starting with 17 January 2021. Precisely, 11 May 2021 was the day in which was registered the second most numerous groups with 54,757 (SE-732.40, SD-13,603.80, CI95%-1440.55). Unfortunately, this trend gradually decreased again from May until mid-October. After this decline, the phase normalized with 2 successful waves of recipients and a maximum of 35    At the end of the analyzed interval, a total of 7,791,250 individuals were completely immunized. From this, 64.93%, n = 5.058.908 with BNT162b1/BNT162b2, and 24.59%, n = 1,913,978 with Ad26.COV2-S. Only a small percentage selected mRNA-1273 or ChA-dOx1 (10.49% n = 818,364). As observed, both reached a plateau phase, this highlighting the human reluctance as a consequence of high reactogenicity. In this context, only 5.12%, n = 399,327 and 5.37%, n = 419,037 were attributed to the last two COVID-19 vaccines ( Figure 2). At the end of the analyzed interval, a total of 7,791,250 individuals were completely immunized. From this, 64.93%, n = 5.058.908 with BNT162b1/BNT162b2, and 24.59%, n = 1,913,978 with Ad26.COV2-S. Only a small percentage selected mRNA-1273 or ChAdOx1 (10.49% n = 818,364). As observed, both reached a plateau phase, this highlighting the human reluctance as a consequence of high reactogenicity. In this context, only 5.12%, n = 399,327 and 5.37%, n = 419,037 were attributed to the last two COVID-19 vaccines (Figure 2).  There was a cumulative total of 2147 local adverse reactions and 17,542 general. Individually, 73.63%, n = 1581 were local (max = 56 on 5 January 2021) and 48.17%, n = 8451 (max = 170 on 6 February 2021) generalized in the case of BNT162b1/BNT162b2 followed by ChAdOx1 with 6.42%, n = 138 (max = 10 on 16 March 2021) and 34.39%, n = 6033 (max = 232 on 13th March 2021). Despite this, the Romanian citizens still opted for Pfizer-BioNTech. Through the prism of one critical argument might be explained this situation: the number of cases ≥100 on twenty-two different occasions, hence the lack of confidence in the serum from AstraZeneca. Continuing with this concept, 15.46%, n = 332 (max = 11 on 6 April 2021), whereas 11.03%, n = 1936 (max = 53 on 22 February 2021) of all adverse reactions were caused by Moderna. In more than eight months since its release, only 4.47% of the cases, n = 96 (max = 5 on 5 June 2021) led to local adverse reactions and 6.39%, n = 1122 (max = 45 on 26 October 2021) to those of generalized reported for Johnson & Johnson ( Figure 3).
According to Elevli et al. [25], to successfully create any ARIMA model, it must first meet four conditions and evaluate if a series of values are constant throughout the analyzed period. Autocorrelation Function (ACF), and Partial Autocorrelation Function (PACF) (Figure 4), are the time-series plots developed to evaluate the seasonality and stationarity. ACF is a metric that describes if the previous values are related to the next ones, whereas PACF determines the degree of correlation coefficient between variable and lag [26]. The performance of the model and misspecification detection is measured through the Bayesian information criterion of Schwarz (BIC), and Akaike information criteria expression (AIC) [27]. Straight lines points to the limit of two standard deviations, while the bars that cross the lines indicate statistically meaningful autocorrelations ( Figure 4). According to Elevli et al. [25], to successfully create any ARIMA model, it must first meet four conditions and evaluate if a series of values are constant throughout the analyzed period. Autocorrelation Function (ACF), and Partial Autocorrelation Function (PACF) (Figure 4), are the time-series plots developed to evaluate the seasonality and stationarity. ACF is a metric that describes if the previous values are related to the next ones, whereas PACF determines the degree of correlation coefficient between variable and lag [26]. The performance of the model and misspecification detection is measured through the Bayesian information criterion of Schwarz (BIC), and Akaike information criteria expression (AIC) [27]. Straight lines points to the limit of two standard deviations, while the bars that cross the lines indicate statistically meaningful autocorrelations ( Figure 4).  Figure 4 (left and right) are displayed the associated plots for the estimated partial and autocorrelations between residuals at distinct lags. Specifically, the lag x, partial and autocorrelation, coefficients evaluate the affinity between the residuals at a time t and time (x − t) (for autocorrelation)/(x + t) (partial autocorrelations) at 95.0% probability to be close to 0. Distinct from autocorrelation is the condition that t + x accounts for the correlations at all lower lags, observation used to appreciate the order of autoregressive model where needed to fit the data. Valid for both functions if the probability at a specific for autocorrelation/particular for partial autocorrelation lag do not contain the estimated coefficient, indeed exists a statistically significant correlation at that lag at CI 95.0%.
Performances of multiple models were generated and interpreted. MAPE with the lowest value per statistical analysis was regarded as the best model. Among all models, ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(0,2,0), ARIMA (2,1,0), ARIMA(1,2,2), ARI-MA(2,2,2), ARIMA(0,2,2), ARIMA(2,2,2), ARIMA(1,1,2), ARIMA(2,2,2), ARIMA(2,1,1), ARIMA(2,2,1), and ARIMA(2,0,2) were chosen for all twelve months and total. The fitted models are presented in Figure 4 and Tables 1 and 2 Figure 4. The estimated ACF and PACF graphs used to predict the vaccination trend against COVID-19 in Romania per each month and cumulative. In this figure (left and right) are displayed the associated plots for the estimated partial and autocorrelations between residuals at distinct lags. Specifically, the lag x, partial and autocorrelation, coefficients evaluate the affinity between the residuals at a time t and time (x − t) (for autocorrelation)/(x + t) (partial autocorrelations) at 95.0% probability to be close to 0. Distinct from autocorrelation is the condition that t + x accounts for the correlations at all lower lags, observation used to appreciate the order of autoregressive model where needed to fit the data. Valid for both functions if the probability at a specific for autocorrelation/particular for partial autocorrelation lag do not contain the estimated coefficient, indeed exists a statistically significant correlation at that lag at CI 95.0%.
There will be an increased interest in vaccines between 5-10% in the first quarter of 2022. The short-term forecasts apply for influenza, HPV, pneumococcal, and polio vaccines, without being detected a decline in the overall interest for the COVID-19 vaccine [29]. However, confinement is still one of the most suitable prevention measures. The number of pediatric consultations/antenatal visits decreased by 52%/45% in April and 34%/34% in May 2020 compared to the same periods of 2019 (p = 0.0001), and demand for immunization significantly decreased as well [30].
Another nationwide article conducted is by Lumbreras-Marquez et al. [31], in which they briefly discuss the risk of morbidity and mortality among Mexican pregnant women. With 934 deaths in 2020, the maternal mortality ratio (MMR) was 46.6 per 100.00 live births, with 202 attributed to COVID-19. Around 31% (286/934) was associated with respiratory failure in contrast to 5% between 2011-2019 since the Mexican government launched the vaccination program on 11 May 2021. Assuming 100% vaccination among women, the authors forecasted weekly maternal deaths that might occur and obtained 993 deaths with an MMR of 46.5; RMSE (0,1,1) was 5.57 and 6.15 in 2021 compared to 2020 (21.6%). The overall figures would decrease to 885 and to an MMR of 41.5.
Distinct authors employed other mathematical models to perform predictive analyses in various countries [32,33]. Hwang [32] adopted a heterogeneous autoregression (HAR) model due to the long-memory feature of COVID-19 based on the growth and vaccination rates. Three novel perspectives derive from this protocol. The first refers to the combination of both growth and vaccination rates, construction and comparison of three types of predictions and coverage probability improvement, and mean interval score of prediction periods via bootstrap technique.
The Susceptible-Infected-Recovered (SIR) model [34] fits within the scope, with an expected effective reproduction term (tR) less than 1. According to a recent article, the tR reduces at fast rates when the values of tR are high, the slope being dependent on the promptness response parameter. A multinomial autoregressive model for time series of counts was introduced with the aim of analyzing the finite-range integer-valued data. For this, the estimations of the parameters were calculated using conditional least squares (CLS), weighted conditional least squares (WCLS), and conditional maximum likelihood (CML). The authors established the asymptotic properties of the estimators and performed simulation studies to certify the current procedure [35]. Bartolucci et al. [36] and proposed multinomial Bayesian and Dirichlet auto-regressive models for series of time-dependent data points centered on counting patients exclusive and exhaustive categorized on predefined groups. Specifically, they were allocated based on the severity and required treatments in either regular wards or intensive care units, along with individuals that passed away and went through the disease. Not only were formulated assumptions on the transition probabilities between categories over a specific period that previously had a normal distribution allowed, but the accessibility to incorporate hypotheses was offered. Markov chain Monte Carlo (MCMC) was employed to estimate the posterior distribution and transition matrices, also allowing to make predictions and compute the reproduction number (Rt), accuracy measured through Bayesian inference. In this way, the authors offer insight regarding data collection during the first wave in Lombardia, Italy, and the effect of non-pharmaceutical interventions. Furthermore, the Dirichlet-multinomial model is adequate in fitting/providing predictive performance for patients admitted in regular and intensive care units.

Conclusions
Our results emphasize the willingness of Romanian residents to get the vaccine. On the opposed pole, there is a significant discrepancy between the internal administration of Westernized countries with >50% of the overall population vaccinated and post-communist middle-class country such as Romania. Approximately 16 million doses have been administered since inception on 27 December 2020. BNT162b1/BNT162b2 and Ad26.COV2-S were the top two choices among the Romanian citizens, with figures comparable in contrast with mRNA-1273 and AZD1222 among all analyzed parameters. Statistical models still play a crucial role in making different predictions. As opposed to the existing literature, our study was focused on forecasting the vaccination rate, and not for establishing the infections trends. Following the analyses performed, ARIMA(1,1,0), ARIMA(1,1,1), ARIMA(0,2,0), ARIMA(2,1,0), ARIMA(1,2,2), ARI-MA(2,2,2), ARIMA(0,2,2), ARIMA(2,2,2), ARIMA(1,1,2), ARIMA(2,2,2), ARIMA(2,1,1), ARIMA(2,2,1), and ARIMA (2,0,2) are the best models that fit within the current situation from all scenarios generated based on their MAPE, p-value and through the Ljung-Box test. Conclusively, mathematical and statistical algorithms proved efficient in providing forecasts of either infectious or, in our case, vaccination trends in a country.